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To analyze and project age-specific mortality or morbidity rates 
age-period-cohort (APC) models are very popular. Bayesian approa- 
ches facilitate estimation and improve predictions by assigning smoo- 
thing priors to age, period and cohort effects. Adjustments for overdis- 
persion are straightforward using additional random effects. When 
rates are further stratified, for example, by countries, multivariate 
APC models can be used, where differences of stratum-specific effects 
are interpretable as log relative risks. Here, we incorporate correlated 
stratum-specific smoothing priors and correlated overdispersion pa- 
rameters into the multivariate APC model, and use Markov chain 
Monte Carlo and integrated nested Laplace approximations for infer- 
ence. Compared to a model without correlation, the new approach 
may lead to more precise relative risk estimates, as shown in an ap- 
plication to chronic obstructive pulmonary disease mortality in three 
regions of England and Wales. Furthermore, the imputation of miss- 
ing data for one particular stratum may be improved, since the new 
approach takes advantage of the remaining strata if the correspond- 
ing observations are available there. This is shown in an application 
to female mortality in Denmark, Sweden and Norway from the 20th 
century, where we treat for each country in turn either the first or sec- 
ond half of the observations as missing and then impute the omitted 
data. The projections are compared to those obtained from a univari- 
ate APC model and an extended Lee-Carter demographic forecasting 
approach using the proper Dawid-Sebastiani scoring rule. 

1. Introduction. Most developed countries have national health registers 
to routinely collect demographic rates. Age-period-cohort (APC) models are 
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commonly used to analyze and project mortality or morbidity rates, in which 
effects related to the age of an individual, calendar time (period) and the 
generation (cohort) can reasonably be assumed to be present. When several 
of such register data sets are available, for example, for different countries, 
each data set could be analyzed separately by a univariate APC model. 
However, for comparable strata, similar unobservable factors are likely to act 
on the different time dimensions (age, period, cohort), so that a multivariate 
APC analysis may seem more appropriate [Hansell et al. (2003), Hansell 
(2004), Jacobsen et al. (2004), Riebler and Held (2010)]. 

A quirk of APC models is the obvious linear dependence of age, period 
and cohort effects leading to a well-known identifiability problem. Over the 
last decades several proposals, ranging from the specification of additional 
identifying restrictions to the definition of estimable functions, have been 
made to solve the identifiability problem; see, for example, Fienberg and 
Mason (1979), Osmond and Gardner (1982), Holford (1983), Robertson and 
Boyle (1986), Clayton and Schifflers (1987), Holford (1992), Fu (2000), Yang, 
Fu and Land (2004) or Kuang, Nielsen and Nielsen (2008). Provided that at 
least one set of age, period or cohort effects is forced to be identical across 
strata (which is often a plausible assumption), differences of stratum-specific 
effects in the multivariate APC model are identifiable without further iden- 
tifying restrictions. They can be interpreted as log relative risks, so that 
heterogeneous time trends, for example, across gender [Riebler et al. (2011)] 
or geographical regions [Hansell (2004), Riebler and Held (2010)], can be 
analyzed. 

Bayesian APC analyses have become very popular in the last years; see, 
for example, Nakamura (1986), Berzuini and Clayton (1994), Besag et al. 
(1995), Ogata et al. (2000), Knorr-Held and Rainer (2001), Bray, Brennan 
and Boffetta (2001), Bray (2002), Baker and Bray (2005), Schmid and Held 
(2007), Riebler and Held (2010). As effects adjacent in time are likely to be 
similar, smoothing priors are typically assumed for age, period and cohort ef- 
fects. Nakamura (1986) used first-order autoregressive priors, while Berzuini 
and Clayton (1994) and Besag et al. (1995) proposed to use second-order 
random walks. The second-order random walk is a discrete-time analogue of 
a cubic smoothing spline [Fahrmeir and Tutz (2001)]. This prior is defined 
on the identifiable second differences, a well-known measure of curvature, 
and penalizes deviations from a linear trend [Fienberg and Mason (1979), 
Clayton and Schifflers (1987)]. The degree of smoothness is controlled by an 
unknown smoothing parameter. Using smoothing priors, overfitting cohorts, 
which by design are sparsely represented, is avoided [Besag et al. (1995)]. 

When age group and period intervals are of different length, an additional 
identifiability problem may induce artificial cyclical patterns in the param- 
eter estimates of uni- and multivariate APC models; see Holford (2006) and 
Riebler and Held (2010), respectively. However, this problem can be solved 
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by applying smoothing functions, such as second-order random walks or 
penalized splines [Holford (2006)]. The assumed smoothness of age, period 
and cohort effects can also be exploited for providing projections, as the 
effects can easily be extrapolated into both the future and past [Knorr-Held 
and Rainer (2001), Bray (2002)]. In a hierarchical Bayesian model, addi- 
tional random effects can be included to account for heterogeneity without 
temporal structure [Berzuini and Clayton (1994), Besag et al. (1995)]. 

Riebler and Held (2010) assumed independent smoothing priors for stra- 
tum-specific effects in multivariate APC models. Here, we propose to link 
stratum-specific smoothing priors. The new approach leads to a multivariate 
correlated random walk, where the joint precision matrix is defined as the 
Kronecker product of the inverse of a uniform correlation matrix and the 
precision matrix of the univariate second random walk. Inference is done 
using Markov chain Monte Carlo (MCMC) and integrated nested Laplace 
approximations [Rue, Martino and Chopin (2009)]. 

The new specification can be regarded as shrinking the stratum-specific 
parameters toward some common trend. Indeed, an alternative model for- 
mulation would introduce a common period effect, say, modeled via a second 
order random walk and additionally independent second order random walks 
for each stratum. While this formulation has two variance parameters, it in 
fact induces correlation between the stratum-specific increments, which are 
defined as the sum of the common innovation and the stratum-specific inno- 
vations, so that it can be translated into a multivariate random walk with 
one variance and one correlation parameter. 

In time series analysis, the use of multivariate random walks plays a fun- 
damental role in multivariate modeling [Harvey (1990)]. The multivariate 
random walk is an example of an intrinsic multivariate Gaussian Markov ran- 
dom field (GMRF) model [Rue and Held (2005)]. Multivariate GMRF mod- 
els with conditional autoregressive (CAR) structure are sometimes called 
multivariate CAR (MCAR) models; see, for example, Gelfand and Vounatsou 
(2003) or Carlin and Banerjee (2003). Proper multivariate GMRF models 
have been introduced by Mardia (1988). Greco and Trivisano (2009) applied 
MCAR models to handle general forms of spatial dependence occurring in 
multivariate spatial modeling of area data. Lagazio, Biggeri and Dreassi 
(2003) and Schmid and Held (2004) used Kronecker product precision ma- 
trices to model different types of space-time interactions in spatial APC 
models [Knorr-Held (2000)]. However, as far as we know, correlated second 
order random walks have never been used in multivariate APC models. 

We further propose the incorporation of correlated overdispersion param- 
eters to model unobserved risk factors without temporal structure but acting 
simultaneously on the different strata. The use of correlated overdispersion 
parameters is similar in spirit to seemingly unrelated regressions, where sin- 
gle regression equations are linked by correlated error terms [Harvey (1990)]. 
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Through the introduction of correlation in the prior distribution the effec- 
tive degrees of freedom are reduced whenever similar behavior in the different 
strata exists. Hence, the precision of relative risks may be improved. Fur- 
thermore, the approach is useful to predict missing records in one particular 
stratum if the corresponding data are available for the remaining strata. 
This might be the case for historical data if the collection of demographic 
rates started not at the same time in different strata. Consider, for example, 
Switzerland, where each canton (administrative unit) is separately respon- 
sible for the implementation of health-policy instruments, so that cancer 
is registered on a cantonal level [Ess et al. (2010), Bouchardy, Lutz and 
Kiihni (2011)]. The first Swiss cancer registration system started in 1970 
in the canton of Geneva followed by registers in the cantons of Vaud and 
Neuchatel in 1974 [Bouchardy, Lutz and Kiihni (2011)]. Compared to other 
cantons without explicit cancer registration, extensive cancer analyses have 
been performed for these cantons; see, for example, Levi et al. (1993, 1998, 
2002), Verkooijhen et al. (2003). Today most cantons have cancer registers 
and it is planned that within the next years the entire Swiss population will 
be captured by a cancer registration system [Bouchardy, Lutz and Kiihni 
(2011)]. Our method can be used to impute missing data for cantons with 
a younger cancer registration system taking advantage of other cantons with 
a longer collection period. Thus, important insight into cancer progression 
for all cantons could be gained. A different aspect might be varying collec- 
tion intervals in different regions, where in some regions data are collected 
on a yearly basis, say, and in other regions on a five-year basis. Here, the cor- 
related multivariate APC approach may be used to impute the rates for the 
missing years. In this paper, we demonstrate the ability to impute missing 
data units in a cross-prediction study of female mortality in Scandinavia. 

The paper is organized as follows. Section 2 introduces the two appli- 
cations presented in this paper. In Section 3 we review multivariate APC 
models and introduce our extended correlated approach (Section 3.1). Then 
we present details on the implementation (Section 3.2). In Section 4 we 
present the results of the two applications. Our findings are summarized in 
Section 5. 

2. Applications. 

2.1. Analysis of heterogeneous time trends in COPD mortality among 
males in England and Wales. We reanalyze male mortality data on chronic 
obstructive pulmonary disease (COPD) in three regions of England and 
Wales: Greater London, conurbations excluding Greater London and rural 
areas (nonconurbations). COPD is one of the most common lung diseases 
making it hard to breathe as a consequence of limited air flow. One of the 
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main causes of COPD is smoking, but also air pollution, smog, dust and che- 
mical fumes are relevant risk factors. While smoking exerts mainly long-term 
effects with a lag period of about 20-30 years [Kazerouni et al. (2004)], air 
pollution can cause both long-term (period or cohort) effects and short- 
term (period) effects [Sunyer (2001), Dockery and Pope (1994)]. We focus 
on short-term effects and the relation between marked air pollution events 
and changes in COPD mortality. For all regions data are available on an 
annual basis from 1950-1999 for seven age groups: 15-24, 25-34, . . . , 75+ 
[Hansell et al. (2003), Hansell (2004)]. Riebler and Held (2010) analyzed 
heterogeneous time trends in these data using an uncorrelated multivariate 
APC model with common age effects. We will compare their results with 
those obtained from a model with correlated stratum-specific period, cohort 
and over dispersion parameters. 

2.2. Extrapolation of overall mortality of Scandinavian females. All data 
were obtained from the Human Mortality Database (2011). The number of 
deaths are stratified by 5-year groups, for all Norwegian, Danish and Swedish 
women aged 0-84 in the period 1900-1999, leading to 17 age groups (0-4, 
5-9, . . . ,80-84) and 20 periods (1900-1904, . . . , 1995-1999). Figure 1 shows 
the death rates per 1,000 person-years for all three countries stratified by 
5-year age groups. To obtain person-years, we used the yearly population 
sizes available for the same age groups and based on the 1st of January. 
We used linear interpolation to get mid-year estimates and then added up 
the resulting quantities to obtain person-years for 1900-1904, . . . , 1995-1999. 
The rates of all three countries show a very similar progression. The peak 
in mortality in the 1915-1919 period, present, in particular, among young 
adults, is supposed to be related to the 1918-1919 Spanish flu pandemic 
which killed about 50 million people worldwide, with most deaths occurring 
among young adults [Andreasen, Viboud and Simonsen (2008)] . During the 
summer of 1918 there were strong influenza waves in Denmark, Sweden and 
Norway [Andreasen, Viboud and Simonsen (2008), Kolte et al. (2008)]. 

We divide the calendar period into two equally sized parts (see Figure 1). 
For either the first or second half of the 20th century, all observations from 
one particular country are treated as missing. The omitted data are then 
predicted exploiting the information provided by the complete data sets of 
the other two countries. This procedure is repeated for all countries and 
hence termed cross-prediction. Thus, we cannot only assess the ability to 
project particular events present for all countries, such as the Spanish-flu 
pattern, but also analyze the prediction quality for years without such spe- 
cific events. The probabilistic projections are compared to those obtained 
from a univariate APC model and to those from an extended Lee-Carter 
demographic forecasting model [Lee and Carter (1992)]. 

The data and code used in this application are provided in the Supplemen- 
tary Material [Riebler, Held and Rue (2011)]. 





Fig. 1. Female death rates per 1,000 person-years (pyrs) in Norway, Denmark and Swe- 
den by age from 1900 to 1999. The vertical line divides the time into equally sized parts. 



3. The correlated multivariate APC model. Let denote the num- 
ber of deaths observed for age group i (i = 1, . . . , I), period j (j = 1, . . . , J) 
and stratum r (r = 1, . . . ,R). In both of our applications r represents a ge- 
ographical region (either a region in England and Wales or a Scandinavian 
country). Deaths can be regarded as events arising from a Poisson process. 
Hence, yij r can be interpreted as the number of events that have occurred 
during an exposure period of rij,y person-years, in which the occurrence 
rate is assumed to be Xij r p6r person-y6cir. Thus, yijr is Poisson distributed 
with rate riij r Xij r , where riij r is known [Armitage (1966), Bollinger (1986)]. 
In the most general formulation of the multivariate APC model, the linear 
predictor is 

Vijr = log(Ajj r ) =fJr + Oir + fjr + Ipkr- 

Here, [i T is the stratum-specific intercept, and 0j r , (fj r and ipk r are stratum- 
specific age, period and cohort effects, respectively. The cohort index k is 
a linear function of the age index i and the period index j. If the time interval 
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widths of age group and period are equal, then k = (I— i) + j . If age group in- 
tervals are M times wider than period intervals, as is the case in the first ap- 
plication (Section 2.1) where M = 10, then k = Mx(I-i)+j [Heuer (1997)]. 



We apply the usual constraints, Yll=i @ir = Ylj=i fir = Sfc=i ^kr = for 
r = 1, . . . , R, to ensure identifiability of the stratum-specific intercepts. How- 
ever, parameter estimates are still not identifiable without imposing addi- 
tional constraints [Fienberg and Mason (1979), Holford (1983)]. In contrast, 
second differences of parameter estimates, for example, 9i r — 29i-\ r + 6i-2r, 
are not affected by the identifiability problem and can be uniquely deter- 
mined [Fienberg and Mason (1979), Clayton and Schimers (1987)]. Further- 
more, stratum-specific differences, for example, 9i ri — 9{ r2 with t\ ^ r2, are 
identifiable (absent an additional constraint), provided that at least one of 
the three time effects (age, period, cohort) is common across strata [Riebler 
and Held (2010)]. 



3.1. Bayesian inference. In a Bayesian context, we work with a hier- 
archical model in which prior distributions need to be assigned to all pa- 
rameters. We use independent flat priors for each stratum-specific inter- 
cept fi r . Riebler and Held (2010) assigned independent smoothing priors to 
the age effects 9 = (9±, . . . ,9j) T , each stratum-specific set of period effects 
ip r = (fir, ■ ■ ■ , 9?jr) T and cohort effects tp r = (V'lr, ■ ■ ■ , ipKr) 1 ^ , r = 1, ... ,12, 
in a model with common age effects. Consider, for example, the period ef- 
fects for a specific stratum r. The random walk of second order (RW2) is 
a smoothing prior based on second differences and penalizes deviations from 
a linear trend. This improper prior can be written as 

(J \ 

f(<P, 



|Ooc/4 J 2)/2 exp| 



J'=3 



4 J - 2)/2 ex P ( 



with precision matrix P^, 



which depends on an unknown precision param- 
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Here, we propose the use of correlated smoothing priors for stratum- 
specific time effects. Let C = (1 — p)I + pj denote a uniform correlation 
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matrix, where p is the unknown correlation parameter, I the identity ma- 
trix and J a matrix of ones. The random walks of the stratum-specific 
period effects <p x , . . . ,<p R can be correlated using the stacked vector ip = 

= ic-r- 2)/2 • (ip,r)^ 2 ex P (-i^{c- 1 ® 

where <8> denotes the Kronecker product and | • |* the generalized determi- 
nant defined as the product of all nonzero eigenvalues. The determinant 
of C~ l is [(1 + (R — l)p ¥ ,)(l — Pip) R ~ l ]~ l ; see the proof in Appendix A. This 
formulation corresponds to a multivariate RW2 with correlated increments 
and is an example for an improper (intrinsic) correlated GMRF [Gelfand 
and Vounatsou (2003), Rue and Held (2005)]. 

To adjust for heterogeneity, which has no temporal structure but is likely 
to exist in the underlying rates, we introduce stratum-specific latent ran- 
dom effects Zijr into the linear predictor. These over dispersion parameters 
are typically assumed to be independent Gaussian variables with mean zero 

and unknown variance kJ 1 , that is, £j,> 1 ~' Af(0, ftj 1 ) [Berzuini and Clayton 
(1994), Besag et al. (1995)]. However, when interpreting these latent effects 
as unobserved covariates, it may be plausible that they act partly simulta- 
neously on the different strata. Hence, we propose correlated overdispersion 
parameters and set Zy = (-Zjji, . . . , %r) t ~ A/"(0, n~ 1 C z ) for all i and j. 

All of the up to eight hyperparameters (four precisions and up to four cor- 
relations) are treated as unknown. Suitable gamma-hyperpriors are assigned 
to the precisions. As in Knorr-Held and Rainer (2001), we use Ga(l, 0.00005) 
for the precisions of age, period and cohort effects and Ga(l, 0.005) for the 
precision of the overdispersion. 

Correlation parameters p are reparameterized using the general Fisher's 
z-transformation [Fisher (1958), page 219]: 



exp(p*) + R-V V 1-P 

where p* can take any real value. It is worth noting that this transformation 
ensures that p only takes values within the interval (—1/(R — 1), 1), so that 
C is positive definite without imposing an additional constraint. Using R = 2 
in (1), we obtain 

exp(p*) - 1 * fl+p 

P= rrr~T i P =log' 



exp(p*) + 1 ' \1 — p 

which is frequently used for constructing confidence intervals for p [Konishi 
(1985)]. Fisher's z-transformation is a variance stabilizing transformation. 
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Fig. 2. Prior distribution for correlation parameter p derived from a zero-mean Gaussian 
distribution for p* with three different values for the precision n p * (top to bottom: 0.2, 0.4, 
0.8,) and three different numbers of strata R (left to right: R = 2, R — 3, R = 4). 



In a Bayesian context this transformation is of particular interest since the 
derivative of a variance stabilizing transformation corresponds to Jeffreys' 
prior for the original parameter [Lehmann (1999), pages 491 and 492]. For 
example, for R = 2, Jeffreys' prior is vr(p) oc 1/(1 — p 2 ), the derivative of 
log(i^) [Lindley (1965), pages 215-220]. 

We assign a normal prior with mean zero and fixed precision k p * to p*. 
Thus, the prior probability that p is larger than zero is equal to 0.5, inde- 
pendent of R. Figure 2 shows the resulting prior for p for three different 
values of k p * and three different values of R. For R = 2 strata, setting k p * 
to 0.2 corresponds to a U-shaped prior, n p * = 0.4 to a roughly uniform prior 
and K p * = 0.8 to a bump-shaped prior for p; compare the first column of 
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Figure 2. Note that k p * = corresponds to the improper Jeffreys' prior. For 
a larger number of strata, the left boundary for the correlation is shifted 
toward zero, resulting in an asymmetric prior distribution for p, since half 
of the total density is distributed to a smaller interval, (— 1/(R — 1),0). We 
use k p * = 0.2, so that sufficient probability mass is assigned to the boundary 
values as well, making extreme posterior correlation estimates possible. 

3.2. Implementation. Bayesian inference for the models presented is not 
straightforward, since the posterior distribution is not analytically available. 
The common tool of choice is MCMC sampling. An alternative is integrated 
nested Laplace approximations (INLAs). To compare these two inference 
techniques, we implemented correlated multivariate APC models using both 
MCMC and INLA. In the first application, we apply INLA and MCMC to 
show the almost perfect coincidence of both approaches. Due to the com- 
plexity of the second application resulting in large thinning intervals and 
burn- in periods, we only present the results of INLA. 

3.2.1. Analysis with MCMC. Algorithmic routines based on MCMC are 
implemented in the low-level programming language C using the GMRFLib li- 
brary [Rue and Held (2005)]. Following Besag et al. (1995), we reparameter- 
ize the model from Zij r to rjij r to obtain multivariate normal full conditional 
distributions for the stratum-specific intercepts = (/xi, . . . , /i r ) T and all sets 
of time effects. Block updating allows the proper incorporation of the sum- 
to-zero constraints for the time effects. It is also possible to omit the sum-to- 
zero constraint for one set of stratum-specific effects and simultaneously re- 
move the stratum-specific intercepts /x from the algorithm. For the precisions 
Gibbs sampling is used as well. The vector rf^ = (fftjij . . . , r)ijR) T has a non- 
standard distribution. It is updated using multivariate Metropolis-Hastings 
steps with a GMRF proposal distribution based on a second-order Taylor 
approximation of the log likelihood [Rue and Held (2005), Section 4.4.1]. For 
the correlation parameters Metropolis-Hastings updates based on a random 
walk proposal are used, such that acceptance rates around 40% are achieved. 
In the application to COPD mortality we use a MCMC run of 350,000 iter- 
ations, discarding the first 50,000 iterations and storing every 20th sample 
thereafter, resulting in 15,000 samples. We have routinely examined conver- 
gence and mixing diagnostics. 

3.2.2. Analysis with INLA. Rue, Martino and Chopin (2009) proposed 
with INLA an alternative deterministic Bayesian inference approach for la- 
tent Gaussian random field models. INLA replaces time-consuming MCMC 
sampling with fast and accurate approximations to the posterior marginal 
distributions. Some empirical comparison with MCMC results can be found 
in Rue, Martino and Chopin (2009), Paul et al. (2010) or Schrodle et al. 
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(2011). We incorporated correlated GMRF models into INLA, enabling the 
analysis of correlated multivariate APC models based on a uniform correla- 
tion structure and using the general Fisher's ^-transformation. The method- 
ology is integrated in the package INLA (see www.r-inla.org) for R [R Devel- 
opment Core Team (2010)]. For both applications, we use the INLA package 
built on 14.03.2011. 

4. Results. 

4.1. COPD mortality among males in England and Wales. We compared 
the uncorrelated model with joint age-effects, and region-specific period and 
cohort effect presented by Riebler and Held (2010) with three different 
correlated formulations: (1) Region-specific period and region-specific co- 
hort effects are correlated; (2) the over dispersion parameters are correlated; 
(3) region-specific period effects, region-specific cohort effects and overdis- 
persion parameters are correlated. To make the models comparable, we used, 
in contrast to Riebler and Held (2010), the same precision for the indepen- 
dent priors of region-specific period effects and also the same precision for the 
independent priors of region-specific cohort effects. For all models MCMC 
and INLA produce virtually identical results; see Figure 3 for a comparison 
of precision and correlation estimates in model 3. The running time of INLA 
was always less than the computation time with MCMC. Inspecting the log 
marginal likelihood returned by INLA, the model with correlated period 
and cohort effects, and correlated over dispersion parameters, was classified 
as the best model. Despite the improper random walk prior, the log-marginal 
likelihood can be used here, as the models are based on the same underlying 
latent structure and only differ by the inclusion of correlation. Furthermore, 
the correlation estimates p v ,p^ and p z of model 3 (Figure 3) are clearly 
different from zero, confirming the between-region dependence. 

Figure 4 compares the estimates of average relative risks obtained from 
MCMC for the models with uncorrelated and correlated region-specific ef- 
fects and overdispersion parameters, respectively. The estimates are relative 
to nonconurbations, where the mortality rates tend to be the lowest. The 
results of both models are very similar. The average relative risk of period 
effects shows the typical year-to-year variation, with higher values in years 
of known air pollution events, such as the "Great Smog" in London in 1952. 
In the average relative risks of cohort effects, different smoking behavior 
may be visible. For a detailed interpretation of the relative risks we refer to 
Riebler and Held (2010). Due to fewer observations, the credible intervals 
are getting wider for younger birth cohorts. However, adjusting for corre- 
lation improves the precision of the relative risks estimates, in particular, 
for younger birth cohorts. The average posterior standard deviation in the 
correlated approach is about 20% and 25% smaller for the average relative 
risks of the period effects and cohort effects, respectively. 



12 



A. RIEBLER, L. HELD AND H. RUE 




0.5 0.7 0.9 0.0 0.4 0.8 0.2 0.4 0.6 0.8 

P<t> Py Pz 



Fig. 3. Approximated posterior marginal densities (solid red line) of precision and cor- 
relation parameters for the multivariate model with joint age effects and correlated period, 
cohort and overdispersion parameters obtained from INLA. Moreover, the corresponding 
histograms of 15,000 MCMC samples obtained from a run with 50,000 burn-in iterations 
and a thinning of 20 are shown. 

4.2. Extrapolation of overall mortality of Scandinavian females. We will 
first briefly introduce the basic and the extended Lee-Carter model consid- 
ered. Then we will present the results of the predictive model assessment 
and compare the projections obtained by the different approaches. 

4.2.1. The quasi-Poisson version of the Lee-Carter model. The Lee- 
Carter model, introduced by Lee and Carter (1992) to forecast mortality 
in the U.S., is one of the best-known methods for mortality forecasting and 
often used as a reference [Booth (2006), Booth et al. (2006)]. It assumes 
a log-bilinear form 

log — = Cli + faltj + Eij , 

no- 
where cxi describes the average shape of the age profile, fa the age-specific 
mortality change from this pattern with time- varying trend Kj, and 
are homoskedastic centered error terms. The parameters are constrained 
to Ylj K j = an d Yli ft = 1 ■ Forecasting using this model proceeds in two 
steps: (1) the model coefficients are estimated; (2) the time trend Kj is ex- 
trapolated based on an ARIMA(0,1,0) time-series model, that is, a random 
walk with drift. This forecasted trend is used to derive the projected age- 
specific mortality rates based on the estimates for ol% and fa from step 1. 
Note that only the uncertainty in the time trend Kj is taken into account in 
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Fig. 4. Average relative risk of death for Greater London and conurbations excluding 
Greater London compared with nonconurbations, analyzed by a multivariate model with 
joint age effects and no correlations across other parameters (top), and a multivariate 
model with joint age effects and correlated period, cohort and overdispersion parameters 
(bottom). Shown are the median estimates within 95% pointwise credible bands. 



the projected rates, so that not all variability is captured [Lee and Carter 
(1992), Butt and Haberman (2009)]. 

The Lee-Carter model was further developed and embedded in a quasi- 
Poisson regression model by Brouhns, Denuit and Vermunt (2002). We used 
the ilc-package [Butt and Haberman (2009)] in R to generate univariate pre- 
dictions for the country under consideration based on this extended model. 
Since the implementation does not allow to project into the past, we reversed 
the time-scale when predicting data of the first half of the 20th century. 

4.2.2. Predictive model assessment. Each of the three models (Lee-Carter, 
univariate APC and multivariate APC; with the latter two abbreviated as 
APC and cMAPC, resp.) generates, for each of the six scenarios of the 
cross-prediction procedure, 10 x 17 = 170 probabilistic forecasts for coun- 
try r* under consideration. We used the mean squared error score to assess 
the concentration of the predictive distribution (sharpness). To assess the 
statistical consistency between the distributional forecasts and the observa- 
tions (calibration), we calculated prediction intervals at various levels and 
computed the empirical coverage probabilities, that is, the proportion of the 
prediction intervals that cover the observed number of cases. To combine 
sharpness and calibration in one measure, we further report the Dawid- 
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Sebastiani scoring rule (DSS) defined as 

DSS= ( ^*-/%>A 2 + 21og( ); 

where yij r * is the observation that realizes, fiij r * the mean and Oij r * the stan- 
dard deviation of the predictive distribution [Gneiting and Raftery (2007)] . 
This score has been proposed as a proper alternative to the predictive model 
choice criterion of Gelfand and Ghosh (1998) and was also used by Czado, 
Gneiting and Held (2009) to assess the predictive quality of a univariate 
APC analysis applied to cancer incidence in Germany. To calculate these 
quantities, we need to post-process the results returned by INLA and the 
ilc-package. 

For the univariate and multivariate APC analysis, INLA returns posterior 
summary estimates and posterior marginal densities for the linear predic- 
tor i]ij r +, with i = 1, ... ,17 and j = 1, . . . , 10 or j = 11,..., 20, depending 
on whether we project the first or last half of the 20th century. The corre- 
sponding estimates for Xij r * are straightforward to derive. For the univariate 
Lee-Carter analysis of country r*, the ilc-package returns the predicted mor- 
tality rate Xij r * and the prediction intervals (symmetric on the log-scale) at 
a predefined level. 

We need the mean fiij r * = E(yij r *) of the predictive distribution for com- 
puting the DSS and the mean squared error score. Using the law of iterated 
expectations [Billingsley (1986), Theorem 34.4], be derived. With 

Vijr*Wjr* ~Po(n ij >* • Xij r *), it follows 

flijr* =E(E(yij r *\Xij r *)) =E(n ijr * ■ Xij r *) =riijr* •E(Xij r *). 

Analogously, the variance of jr ,* = Var(yj jr *) follows from the law of total 
variance as 

o}j V * = E(VaT(yij r *\Xij r *)) +Vax(E(y ijr *\Xij r *)) 

= E(mjr* ■ Xij r .*) + Var(n ijr * • X ijr *) 

= riijr* ■ E(X ijr *) + n? r * Var(Ajj r *) 

for INLA. Under a quasi- Poisson approach with \ax{yij r \Xij r ) — <f> ■ fiij r ■ 
Xijr we need to explicitly incorporate the over dispersion parameter (f>, so 
a fjr = 4> ' nij r E(Xij r ) + nfj r Var (Ajj r ) . Here, we used the total lack of fit as 0; 
compare Booth, Maindonald and Smith (2002). 

To obtain posterior predictive quantiles, the missing Poisson variation was 
added to the predicted mortality rates. When using INLA, this is done by 
numerical integration over the predicted posterior marginal of Xij r *- Since 
we do not obtain the posterior marginal of Xij r * using the ilc-package, we 
used Monte Carlo sampling instead. To be more precise, we generated iV = 
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_ Table 1 

Mean Dawid-Sebastiani score (DSS), mean squared error score (MSE), empirical 
coverage probabilities for all models predicting female mortality of one country either for 
the first or second half of the 20th century 





Measure 




i Qnn i Q4Q 






1 Q^fl — 1 QQQ 




Lee-Carter 


APC 


cMAPC 


Lee-Carter 


APC 


cMAPC 










NORWAY 










DSS 


232.4 


38.9 


17.9 


13.5 


16.8 


15.0 




MSE 


3.49e+06 


7.21e+06 


1.73e+06 


3.10e+06 


1.92e+07 


1.50e+07 


Level 


95% 


19 


66 


70 


91 


96 


78 




80% 


13 


52 


35 


79 


73 


54 




50% 


!) 


39 


11 


52 


35 


37 










DENMARK 








DSS 


50.5 


41.2 


14.7 


22.2 


17.1 


13.7 




MSE 


3.37e+06 


2.30e+07 


3.56e+06 


2.03e+07 


6.43e+07 


1.80e+07 


Level 


95% 


24 


58 


96 


66 


95 


99 




80% 


17 


50 


93 


51 


92 


8:5 




50% 


6 


15 


79 


30 


64 


65 










SWEDEN 








DSS 


249.6 


43.4 


15.4 


24.7 


18.5 


13.6 




MSE 


4.45e+07 


9.48e+07 


9.04e+06 


9.43e+07 


2.04e+08 


3.05e+06 


Level 


95% 


11 


64 


99 


70 


97 


95 




80% 


8 


28 


94 


62 


95 


72 




50% 


4 


8 


55 


41 


75 


54 



100,000 samples for the linear predictor rjij r * from a normal distribution 
with mean log(Ajj r *) and variance derived from the symmetric prediction 

(s) 

intervals on log-scale. Then, we generated for each sample t]\j r * , s = 1, . . . , N, 

(s) 

one sample from a negative binomial distribution with density 

T(y + d) f d W m 
ny> T{d)T{y + l)\m + d) \m + d 

where E(y) = m and Var(y) = m(l + m/d). To match the mean and vari- 
ance of the quasi- Poisson distribution, we set = riij r * exp(^^) and 

d( s ) = m ( s ) j^rj) _ f or each sample rjf^. Subsequently, quantiles at differ- 
ent prediction levels could be extracted from the samples. 

Table 1 shows for all models the mean squared error (MSE) score, the 
empirical coverage probabilities and the mean DSS averaged over all 170 
projections. For five of the six scenarios and especially when predicting the 
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Table 2 

Median and 95% credible interval for all correlation parameters in the correlated 

multivariate APC model 



Predicted 


Predicted 










period 


country 


Age 


Period 


Cohort 


Overdispersion 


1900-1949 


Norway 


0. 9780. 991o. 996 


0.8l0.96o.99 


0.57O.820.94 


0.9l0.93o.94 




Denmark 


0. 9940. 998o. 999 


0.350.87(3.99 


0.440. 760.92 


0.830.86o.89 




Sweden 


0. 9840. 994q. 997 


0.7C)0.95o.99 


0.580.84o.95 


0.750.79o.84 


1950-1999 


Norway 


0. 9690. 986o. 994 


G-.OlO.51o. 85 


0.280.62o.84 


0.9o0.92o.94 




Denmark 


0. 9640. 984o. 994 


0.180.75o.96 


0.890.97o.99 


0.820.85o.88 




Sweden 


0. 9760. 990o. 996 


0.620.96l.o0 


0.6o0.83o.94 


0.770.81o.84 



first 10 periods, the correlated multivariate APC model is clearly the best 
model. Although the prediction intervals are sometimes too large, as indi- 
cated by larger empirical coverage probabilities than the nominal level, the 
empirical coverage is mostly closer to the nominal level than for the other 
two approaches. Regarding mean DSS and empirical coverage, the univari- 
ate APC model also performs mostly better than the extended Lee-Carter 
approach. In particular, predicting the first half of the 20th century, the ex- 
tended Lee-Carter approach showed severe deficits. It was classified as the 
best model regarding all predictive assessment criteria only when predicting 
the second half of the 20th century for Norway. Inspecting the posterior cor- 
relation estimates of the cMAPC model (Table 2), we observe that for this 
scenario the posterior correlations between country-specific period effects 
and also between country-specific cohort effects are lower than in the other 
scenarios. In contrast, the correlation between country-specific overdisper- 
sion parameters is quite high. 

To compare the performance change from short-term to long-term fore- 
casts, Figure 5 shows the cumulative average DSSj, where DSSj denotes the 
mean DSS across age group at period j. Except for predicting death rates 
in Norway from 1950-1999, the curve for the cMAPC model is always below 
those of the two univariate approaches. Predicting the periods in the first 
half of the 20th century, the cumulative average DSSj of the extended Lee- 
Carter model is constantly increasing, indicating a lower projection quality 
with increasing time. The largest jump occurs for the period 1915-1919 with 
the Spanish flu. In contrast, the score of the univariate APC model decreases 
when predicting more periods, while the score for the correlated multivariate 
APC model stays fairly constant. Predicting the periods in the second half 
of the 20th century, the cumulative average DSSj slowly increases for all 
models. However, except for Norway, the cumulative score of the extended 
Lee-Carter model shows larger jumps from one period to the next. 
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Fig. 5. Cumulative average of mean Dawid-Sebastiani scores across age groups. 



4.2.3. Projections. The median projected death rates per 1,000 person- 
years together with 80% pointwise prediction intervals for Norwegian women 
obtained from all three models are shown in Figure 6 for the first half and 
Figure 7 for the second half of the 20th century. Furthermore, the true 
death rates of Norwegian women are added to the figures. For all models 
and especially for the univariate APC model, the prediction intervals are 
getting wider as prediction time goes on. While the projections of the two 
univariate approaches are almost straight lines, different temporal patterns 
across age groups can be seen for the correlated model. The Spanish flu was 
especially well captured by the correlated approach. Since the Spanish flu 
did not affect all age groups, this event (also present for Denmark and Swe- 
den) was captured and transferred to the projections due to the correlated 
over dispersion parameters and not because of the correlated period effects, 
as one might intuitively guess. Predicting the second half of the 20th cen- 
tury, the projections of the extended Lee-Carter model agree very well with 
the true observations. In particular, the rates for ages over 60 are well pro- 
jected, where the cMAPC model tends to underestimate the overall death 
rate. 

Projections for Danish and Swedish women are shown in Figures 8, 9, 
10 and 11 in Appendix B. Here, the projections of the correlated approach 
coincide for all scenarios and all age groups very well with the observed 
rates. In contrast, the extended Lee-Carter approach tends to underesti- 
mate rates for younger age groups when predicting the first half, and to 




Fig. 6. Median predicted mortality rates per 1,000 person-years (pyrs) of Norwegian women within 80% CI regions for the years 
1900-1949. Shown are the results from an extended Lee-Carter model (orange lines), a univariate APC model (dark blue lines) and 
a multivariate APC model with age, period, cohort and overdispersion parameters correlated across regions (light blue shaded). In addition, 
the true mortality rates for Norwegian women (red a) are shown. 




Fig. 7. Median predicted mortality rates per 1,000 person-years (pyrs) of Norwegian women within 80% CI regions for the years 
1950-1999. Shown are the results from an extended Lee-Carter model (orange lines), a univariate APC model (dark blue lines) and 
a multivariate APC model with age, period, cohort and overdispersion parameters correlated across regions (light blue shaded). In addition, 
the true mortality rates for Norwegian women (red o) are shown. 
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overestimate rates for older age groups when predicting the second half of 
the 20th century. 

5. Discussion. In this paper we proposed the use of correlated smoothing 
priors and correlated overdispersion parameters for multivariate Bayesian 
APC approaches analyzing mortality or morbidity rates stratified by age, pe- 
riod, cohort and one further variate. The specification of correlated smooth- 
ing priors involves a Kronecker product precision structure for the outcome- 
specific time effects, that is, age, period and/or cohort effects. We imple- 
mented correlated multivariate APC models based on a uniform correlation 
structure in MCMC and INLA. In the first application we analyzed COPD 
mortality among males in England and Wales using MCMC and INLA, and 
compared the results of an ordinary multivariate APC model with those 
obtained from different correlated model formulations. A comparison of 
MCMC and INLA showed virtually identical results. As indicated by the 
log marginal likelihood, the formulation with both correlated overdispersion 
and correlated stratum-specific period and cohort effects was classified as the 
best. As shown in the relative risk estimates, the correlated model structure 
improved the precision of the relative risk estimates especially for younger 
birth cohorts. 

In a second application on overall mortality of Danish, Swedish and Nor- 
wegian women in the 20th century, we performed a cross-prediction study. 
We illustrated the good predictive quality of the correlated approach when 
imputing missing data units for one country if these units are available 
for the other countries. As focus was set on projections, which are an es- 
timable function in (multivariate) APC models, we were able to consider the 
most flexible model with country-specific age, period, cohort and overdis- 
persion parameters that were all correlated across countries. In total, we 
considered six scenarios treating in turn for a particular country either the 
first or second half of the data as missing and subsequently predicting the 
omitted data units. We compared the projections to those obtained from 
a univariate APC model and a Lee-Carter approach embedded into a quasi- 
Poisson model using the proper Dawid-Sebastiani scoring rule. Since only 
the correlated formulation can take advantage from the complete tables of 
the remaining two countries, it was classified as the best model in five of the 
six scenarios. Furthermore, we observed that the predictive quality stayed 
almost constant when increasing the number of periods to predict, which 
was not the case for the two univariate approaches. Thus, the correlated ap- 
proach outperformed both univariate methods in short and long-term pro- 
jections. 

In real life, long-term projections of mortality or disease rates into the 
future are difficult to make using our proposed approach, since there will be 
no data from comparable time-series available. For short-term predictions, 
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data for some strata may be already available, while for others they are still 
missing. Here, the correlation approach will be useful to forecast the missing 
units. 

For the simultaneous projection of several strata, Li and Lee (2005) ex- 
tended the original log-linear Lee-Carter model. In the simplest extension, 
they assigned each stratum its own age pattern, while assuming shared age- 
specific patterns of mortality change and a shared time trend. Future values 
are then predicted for the shared time trend based on an ARIMA process. In- 
corporating both shared and stratum-specific parameters, this model seems 
to be similar in spirit to an uncorrelated multivariate APC model [Riebler 
and Held (2010)]. In a more complex model, Li and Lee (2005) included an 
additional stratum-specific bilinear term to allow for differences between the 
rate of change in mortality in a particular stratum and the rate of change im- 
plied by the common bilinear term. However, in contrast to the correlated 
approach presented here, those extensions cannot take advantage of data 
units missing in one stratum but present for the remaining. By contrast, our 
approach could be equally used to impute data for all strata simultaneously, 
benefitting from the periods where complete data existed. Furthermore, we 
can use stratum-specific effects for all parameters. Information from the 
remaining strata is borrowed by incorporating correlation. In a Bayesian 
setting the inclusion of correlation between parameters is straightforward 
via the prior distributions, whereas in a frequentist setting this seems to be 
more complex. 

Another interesting field of application is similar in spirit to the inference 
on collapsed margins, proposed by Byers and Besag (2000). In the context 
of collapsed margins, complete data are available on several risk factors, but 
a subsequent analysis indicates that information on an additional variable 
is relevant. For this variable the numbers of persons at risk are available 
but not the numbers of cases. Byers and Besag (2000) propose a Bayesian 
approach to estimate the effect of the variable. In multivariate APC models 
it might be that multiple data sets are only available for a specific period in 
time, while, before and/or after this date, data only exist for the conjunc- 
tion of outcomes. A typical example could be Germany, which was formerly 
united, then separated and now united again. Using age-specific data on the 
population sizes from 1990 up to now for East and West Germany separately, 
it may be possible to project mortality rates for both individual parts, by 
exploiting the correlation present when they were divided. Thus, the ob- 
servations for the conjunction of both parts could be separated. However, 
further investigations are required to explore the applicability. 

The proposed methodology can only be applied to data stratified by one 
further variate. For analyzing mortality rates stratified by more than one fur- 
ther variate, a conditional approach using a multinomial logistic regression 
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model has been proposed [Held and Riebler (2011)]. However, the incorpo- 
ration of correlation has not yet been considered. 

A disadvantage of the proposed methodology might be that it is essentially 
additive in age, period and cohort, so that interactions between the time 
dimensions cannot be explicitly modeled. Currie, Durban and Eilers (2004) 
proposed two-dimensional smoothing to address this problem in the analysis 
of an individual registry data set. Biatat and Currie (2010) started to extend 
this work and proposed a model to compare various mortality tables by 
assuming a common two-dimensional P-spline surface and additional one- 
dimensional smoothing functions for age and period. 

In general, the use of a Kronecker product structure is a promising area for 
further research, as different correlation structures can easily be combined 
with different precision matrices. Based on the uniform correlation structure 
INLA can, by now, correlate a wide range of other GMRF models as compo- 
nents of more general additive regression models. Examples are as follows: 
nonparametric seasonal models, continuous-time random walks or models 
with a user specified precision matrix. However, the uniform correlation 
structure is rather restrictive and may only be plausible for a few outcomes. 
Future work encompasses the integration of other correlation structures, for 
example, depending on the distance between units, so that the approach can 
be extended to the space-time context, for example. Furthermore, we are in- 
vestigating the use of correlated two-dimensional smoothing priors in INLA 
to incorporate interactions between time-dimensions into the multivariate 
APC model. 



APPENDIX A: UNIFORM CORRELATION STRUCTURE 



Let C be an R x R correlation matrix with uniform correlation structure, 
so that C = (l-p)I + pJ: 

/I P ••• P\ 
p '■■ '-■ : 



P 



P 



P 

1/ 



where p is the correlation parameter, I denotes the R x R identity matrix 
and J an R x R matrix of ones. Then the inverse C _1 is given by 



fa 
b 
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(R-2)-p + l 

(p-i){(R-i)- P +iy 

p 

(p-i){(R-i). p + iy 

Proof. If C _1 C = 1, then C _1 is the inverse of C. For the diagonal 
elements of C _1 C it follows that 

(C- l C) {iA = a + (R-l)-b-p 

-(R-2)-p-l + (R-l)- p 2 
(p-l){(R- l)-p + l} 

-Rp + 2p-l + Rp 2 -p 2 _ 
~ Rp 2 - p 2 - Rp + p + p-l~ 
for all i = 1, . . . , R. For the nondiagonal elements, that is, i ^ j, we get 
{C~ l C) {i>j) = a ■ p + b+(R-2)-b- p 

= {-(R-2)-p-l}p + p+(R-2)-p 2 
(p-l){(R-l).p + l} 

-Rp 2 + 2p 2 - p + p + Rp 2 - 2p 2 _ 

(p-l){(R-l).p+l} ~ ■ □ 

The determinant |C _1 | is given by 

icr 1 ! = icr 1 = [(i + (R- i) P )(i - p)^ 1 ]" 1 - 

PROOF. We show that |C| = (1 + (R — l)p)(l - p) R ~ l , as the inverse 
case follows immediately. Remember that |C| = |I — pi + p3\. The identity 
matrix has R times the eigenvalue 1 . The matrix J has once the eigenvalue R 
and R — 1 times the eigenvalue 0. Since both matrices (I and J) share the 
same eigenvectors, the eigenvalues for C are (1 — p + p ■ R) and (1 — p) 
with multiplicity R — 1, so that the determinant of C, the product of the 
eigenvalues, is 

|C| = (1 -p + p ■ R)(l - p)*" 1 = (1 + (R- l)p)(l - p) R -\ □ 

APPENDIX B: PROJECTION FOR DENMARK AND SWEDEN 

The median projected death rates per 1,000 person-years together with 
80% pointwise prediction intervals for Danish and Swedish women obtained 
from all three models are shown in Figures 8 and 10 for the first half and 
Figures 9 and 11 for the second half of the 20th century. 
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Fig. 10. Same quantities as in Figure 6 but for Swedish women (red o). 




Fig. 11. Same quantities as in Figure 7 but for Swedish women (red o). 
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SUPPLEMENTARY MATERIAL 

Code repository for the cross-prediction study of overall mortality of 
Scandinavian women (DOI: 10.1214/11-AOAS498SUPP; .zip). This reposi- 
tory archives the data, R-code and results for the cross-prediction study of 
overall mortality of Scandinavian women presented in Section 4.2. In par- 
ticular, it contains code to make Table 1 and Figures 5-11. 
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